Introduction: Breast cancer


EU Science Hub

  • 13.3% of all cancer diagnoses.
  • 7.1% of all cancer-related deaths.

WHO Guide to cancer early diagnosis

  • Early diagnosis and prognosis may aid treatment and focus resources where needed.

Materials and methods

Materials


Kaggle breast cancer data

26 explanatory variables.

  • Condition
  • tumor type
patient_id gender education treatment_data id_healthcenter id_treatment_region hereditary_history birth_date age weight thickness_tumor marital_status marital_length pregnency_experience giving_birth age_FirstGivingBirth abortion blood taking_heartMedicine taking_blood_pressure_medicine taking_gallbladder_disease_medicine smoking alcohol breast_pain radiation_history Birth_control(Contraception) menstrual_age menopausal_age Benign_malignant_cancer condition
111036008041 0 4 2019 1.11e+09 1.11e+09 1 1989 30 69 0.90 1 0 0 0 0 0 4 0 1 1 0 0 1 1 1 1 0 1 death
111035996130 0 6 2019 1.11e+09 1.11e+09 0 1989 30 71 0.80 0 0 0 0 0 0 1 1 1 1 0 1 0 0 0 2 0 0 death
111035971333 0 5 2019 1.11e+09 1.11e+09 0 1989 30 74 0.90 1 0 0 0 0 1 4 1 1 0 0 0 1 1 0 1 0 1 death
111036018485 0 5 2019 1.11e+09 1.11e+09 1 1989 30 75 0.70 1 1 1 3 1 0 2 1 1 1 1 0 0 0 0 2 0 0 death
111035985474 0 1 2019 1.11e+09 1.11e+09 0 2009 10 70 0.25 0 0 0 0 0 0 7 0 0 0 0 0 0 0 0 0 0 0 death
111035903616 0 3 2019 1.11e+09 1.11e+09 1 1989 30 79 0.70 0 0 0 0 0 0 6 1 1 1 0 1 1 1 1 1 0 1 death
111036003507 0 4 2019 1.11e+09 1.11e+09 1 1990 29 96 0.10 0 0 0 0 0 0 4 1 1 0 0 0 1 1 0 2 0 1 death
111036026259 0 5 2019 1.11e+09 1.11e+09 0 1990 29 75 0.80 0 0 0 0 0 0 0 0 1 1 0 1 0 0 0 2 0 0 death

Process


Cleaning the data


Unrealistic age and weight proportions.

Cleaning the data

Changes

Column names
  • Removed special characters from variable names.

Variable values
  • Greedy cleanup of binary variables: non-binary values were set to 1.
  • Blood type had non-conforming entries, which were set to NA.
  • Birth dates not consisting of exactly 4 digits were set to NA.

Filtering out samples
  • Only include women, as they are the risk group for breast cancer (few men are affected).
  • Only include women older than 20 years, as they are the primary risk group.
  • Remove samples with abnormal weight-to-age proportions.
  • Filter out a single woman recorded as not yet having had her first period but as having pregnancy experience.

Removing columns
  • Remove singular columns (only one value across all samples).
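
The cleaning rules listed above could be sketched roughly as follows. This is a minimal illustration in plain Python, not the project's actual code. The function names, the `binary_columns` argument and the assumption that `gender == 0` codes for women are all hypothetical; the column names `gender`, `age` and `birth_date` are taken from the data header. The blood-type and weight-to-age rules are left out because the slides do not state their criteria.

```python
def clean_binary(value):
    """Greedy cleanup: anything other than 0 or 1 is set to 1."""
    return value if value in (0, 1) else 1


def clean_birth_year(value):
    """Keep birth years written as exactly four digits, otherwise None (NA)."""
    text = str(value)
    if len(text) == 4 and text.isascii() and text.isdigit():
        return int(text)
    return None


def drop_singular_columns(rows):
    """Remove columns that hold the same value in every remaining row."""
    if not rows:
        return rows
    keep = [key for key in rows[0] if len({row[key] for row in rows}) > 1]
    return [{key: row[key] for key in keep} for row in rows]


def clean(rows, binary_columns, female_code=0, min_age=20):
    """Apply the value fixes, filter the samples, then drop singular columns.

    female_code=0 is an assumption about how gender is coded.
    """
    cleaned = []
    for row in rows:
        row = dict(row)  # work on a copy so the input is left untouched
        for column in binary_columns:
            row[column] = clean_binary(row[column])
        row["birth_date"] = clean_birth_year(row["birth_date"])
        if row["gender"] == female_code and row["age"] > min_age:
            cleaned.append(row)
    return drop_singular_columns(cleaned)


# Made-up example rows, not taken from the data set
example_rows = [
    {"gender": 0, "age": 30, "smoking": 2, "birth_date": "1989"},
    {"gender": 0, "age": 10, "smoking": 0, "birth_date": "2009"},
    {"gender": 1, "age": 45, "smoking": 1, "birth_date": "1974"},
    {"gender": 0, "age": 29, "smoking": 0, "birth_date": "90"},
]
cleaned_rows = clean(example_rows, binary_columns=["smoking"])
```

In this example the second and third rows are filtered out, and `gender` is then dropped because every remaining sample has the same value.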

Augmenting the data

Changes

Values changed
  • Categorical variables encoded with labels as factors.

Adding columns
  • Age at treatment
  • Normalised numerical variables
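
As an illustration of the two added columns, the sketch below derives an age at treatment and normalises a numeric variable. Two things are assumptions, not statements from the slides: that age at treatment is the treatment year minus the birth year (consistent with the sample rows, where 2019 - 1989 = 30), and that normalisation means z-scoring. The function names are hypothetical.

```python
from statistics import mean, pstdev


def age_at_treatment(treatment_year, birth_year):
    """Treatment year minus birth year; None (NA) if either value is missing."""
    if treatment_year is None or birth_year is None:
        return None
    return treatment_year - birth_year


def z_normalise(values):
    """Centre on the mean and scale by the population standard deviation.

    z-scoring is an assumed choice; the slides only say "normalised".
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no spread, so map it to zeros
        return [0.0 for _ in values]
    return [(value - mu) / sigma for value in values]


# Weights from the first three sample rows of the data table
normalised_weights = z_normalise([69, 71, 74])
```

After z-scoring, the values have mean 0 and unit variance, which puts age and weight on a comparable scale before modelling.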

Exploratory Analysis

Distributions

Distributions

Distributions

Distributions

MCA Tumor type

MCA Condition

MCA Rotation

Binary correlation

Numeric correlations

Model

Predicting tumor type


Reduced Model

  • Age (norm)
  • Weight (norm)
  • Hereditary history
  • Smoking
  • Radiation therapy
  • Menstrual age
  • Pregnancy experience
  • Abortion
  • Breast pain
model     sensitivity  specificity  balanced_accuracy
Max_pred  69%          28%          48%
Red_pred  81%          24%          53%
baseline  100%         0%           50%
Note:
Positive class = Malignant
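
For reference, the three metrics in the table can be computed from true and predicted labels as in the sketch below. It is not the project's evaluation code; the function name and the toy labels are hypothetical. Balanced accuracy is the mean of sensitivity and specificity, which explains the baseline row: a model that always predicts the positive class reaches 100% sensitivity, 0% specificity and therefore 50% balanced accuracy.

```python
def evaluate(actual, predicted, positive):
    """Return sensitivity, specificity and balanced accuracy.

    Assumes both classes occur in `actual`, so neither denominator is zero.
    """
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    sensitivity = tp / (tp + fn)  # share of positives that were found
    specificity = tn / (tn + fp)  # share of negatives that were found
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Toy labels, not taken from the data set
actual = ["malignant", "benign", "malignant", "benign"]
always_positive = ["malignant"] * len(actual)
baseline_metrics = evaluate(actual, always_positive, positive="malignant")
```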

Predicting Condition


Reduced Model

  • Age
  • Gallbladder medicine
  • Menopausal age
  • Abortion
model     sensitivity  specificity  balanced_accuracy
Max_pred  91%          42%          67%
Red_pred  90%          39%          64%
baseline  100%         0%           50%
Note:
Positive class = Death

Discussion & Conclusion

Discussion

  • Greedy cleaning approach
  • Disagreement between MCA and reduced model
  • A general set of rules for valid entries

Conclusion

  • Possible to predict both tumor type and outcome.
  • The prediction accuracy.
  • Shiny app

Bibliography


Shiny app

Shiny App